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Abstract 

We propose and analyze an extremely fast, efficient and simple method for solving the problem: 

min{||u||i : Au = f,u £ R"}. 

This method was first described in 1!], with more details in [2|] and rigorous theory given in [3 and 0|. 
The motivation was compressive sensing, which now has a vast and exciting history, which seems to have 
started with Candes, et.al. [f| and Donoho, See 0], [S| and 0] for a large set of references. Our 
method introduces an improvement called "kicking" of the very efficient method of Q}, Q and also applies 
it to the problem of denoising of undersampled signals. The use of Bregman iteration for denoising of 
images began in [?| and led to improved results for total variation based methods. Here we apply it to 
denoise signals, especially essentially sparse signals, which might even be undersampled. 

1 Introduction 

Let AGR mxn , with n>m and /Gi? m , be given. The aim of a basis pursuit problem is to find uGR n by 
solving the constrained minimization problem: 

mm{J(u)\Au = f} (1.1) 
ueii™ 

where J(u) is a continuous convex function. 
For basis pursuit, we take: 

n 

J(u) = Mi = I>il- (!- 2 ) 

i=i 

We assume that AA T is invertible. Thus Au — f is underdetermined and has at least one solution, u = 
A T (AA T )~ 1 / , which minimizes the £2 norm. We also assume that J(u) is coercive, i.e., whenever — > 
00, J(u) — ?>oo. This implies that the set of all solutions of (jl.ip is nonempty and convex. Finally, when J(u) 
is strictly or strongly convex, the solution of (ll.l[) is unique. 

Basis pursuit arises from many applications. In particular, there has been a recent burst of research 
in compressive sensing, which involves solving (II. ip . (|1.2j) . This was led by Candes et.al. Donoho, [gj], 
and others, see 0, [J and [3| for extensive references. Compressive sensing guarantees, under appropriate 
circumstances, that the solution to (jl.ll) . (|1.2[> gives the sparsest solution satisfying Au = f. The problem 
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then becomes one of solving (jl.ll) . (|1.2I) fast. Conventional linear programming solvers are not tailored for 
the large scale dense matrices A and the sparse solutions u that arise here. To overcome this, a linearized 
Bregman iterative procedure was proposed in [l| and analyzed in Q, 0] and In 0, true, nonlinear 
Bregman iteration was also used quite successfully for this problem. 

Bregman iteration applied to (|1.2p involves solving the constrained optimization problem through 

solving a small number of unconstrained optimization problems: 

n un||i|«|i + i||A«-/[[»J (1.3) 

for n. > 0. 

In [2] we used a method called the fast fixed point continuation solver (FPC) [1| which appears to be 
efficient. Other solvers of (|1.3p could be used in this Bregman iterative regularization procedure. 

Here we will improve and analyze a linearized Bregman iterative regularization procedure, which, in its 
original incarnation, [l[ , Q , involved only a two line code and simple operations and was already extremely 
fast and accurate. 

In addition, we are interested in the denoising properties of Bregman iterative regularization, for signals, 
not images. The results for images involved the BV norm, which we may discretize for nxn pixel images as: 

n-1 

TV(u) = J2 ((^ +lj -^) 2 + K J+1 -z % -) 2 )^ (1-4) 

ij = l 

We usually regard the success of the ROF TV based model [t| 

min{TVH + ^||/- U || 2 } (1.5) 

(we now drop the subscript 2 for the Li norm throughout the paper) as due to the fact that images have 
edges and in fact are almost piecewise constant (with texture added). Therefore, it is not surprising that 
sparse signals could be denoised using () 1 . 3[) . The ROF denoising model was greatly improved in Q and [l(| 
with the help of Bregman iterative regularization. We will do the same thing here using Bregman iteration 
with (|1.3[) to denoise sparse signals, with the added touch of undersampling the noisy signals. 

The paper is organized as follows: In section 2 we describe Bregman iterative algorithms, as well as 
the linearized version. We motivate these methods and describe previously obtained theoretical results. 
In section 3 we introduce an improvement to the linearized version, call "kicking" which greatly speeds 
up the method, especially for solutions u with a large dynamic range. In section 4 we present numerical 
results, including sparse recovery for u having large dynamic range, and the recovery of signals in large 



amounts of noise. In another work in progress Uj we apply these ideas to denoising very blurry and noisy 
signals remarkably well including sparse recovery for u. By blurry we mean situations where A is perhaps a 
subsampled discrete convolution matrix whose elements decay to zero with n, e.g. random rows of a discrete 
Gaussian. 



2 Bregman and Linearized Bregman Iterative Algorithms 

The Bregman distance [12j . based on the convex function J, between points u and v , is defined by 

D p j(u,v) = J(u)- J{v)-(p,u-v) (2.6) 

where p<Ed.J(v) is an element in the subgradient of J at the point v. In general D p (u,v) ^ D p (v,u) and the 
triangle inequality is not satisfied, so D p (u,v) is not a distance in the usual sense. However it does measure 
the closeness between u and v in the sense that D p (u,v) > and D p (u,v) > D P j(w,v) for all points w on the 
line segment connecting u and v. Moreover, if J is convex, D p (u,v) >0, if J is strictly convex D P j(u,v) >0 
for u v and if J is strongly convex, then there exists a constant a > such that 

D p (u,v)>a\\u~v\\ 2 . 
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To solve (jl.ip Bregman iteration was proposed in Q ■ Given u° = p° = 0, we define: 



/ +1 =argmin I J{u) - J(u k ) - {u-u k ,p k ) + \\\Au- f\\ 2 \ (2.7) 



p fc+1 =p fe -A T (>lu fc+1 -/). 

This can be written as 



u k+1 = arg mm <{ Dj" (u,u fc ) + ~ \\Au - 



It was proven in Q that if J(u) £ C 2 (fi) and is strictly convex in f2, then ||ykt fc — /|| decays exponentially 
whenever u k £ f2 for all fc. Furthermore, when u fc converges, its limit is a solution of (jl.ip . It was also proven 
in Q that when J{u) = \u\i, i.e. for problem (jl.ip and (jl.2p . or when J is a convex function satisfying some 
additional conditions, the iteration (|2.7p leads to a solution of ([l.ip in finitely many steps. 

As shown in 0], see also Q, [loj ]. the Bregman iteration (|2.7p can be written as: 

M fe+1 = argmin { J-(u) + ^ || — 1| 2 1 (2.8) 

tiS-R™ [ 2 J 

This was referred to as "adding back the residual" in 0] . Here /° = 0,?i = 0. Thus the Bregman iteration 
uses solutions of the unconstrained problem 

min{j( U ) + i||^-/|| 2 ) (2.9) 

as a solver in which the Bregman iteration applies this process iteratively. 

Since there is generally no explicit expression for the solver of (|2 . T[) or (|2.8p . we turn to iterative methods. 
The linearized Bregman iteration which we will analyze, improve and use here is generated by 

?/ +1 =argmin ( J(u) - J(u k ) - (u-u k ,p k ) + - (u k - SA T (Au k - /))f 
ueR n i 2o 

p k+1 =p k -~(u k+1 -u k ) - A T (Au k - /). (2.10) 
o 

In the special case considered here, where J(u) = ju||u||i, then we have the two line algorithm 

v k+1 =v k -A T (Au k -f) (2.11) 
u fe+1 =(5-shrink(u fc+1 ,^) (2.12) 



where v k is an auxiliary variable 



and 



v k =p k + \u k (2.13) 
o 



!x — /i, if x > fi 
0, if — n<x<^ 
x + fj,, if x < — /i 

is the soft thresholding algorithm • 

This linearized Bregman iterative algorithm was invented in [l[ and used and analyzed in 0| , 0] and 0] ■ 
In fact it comes from the inner-outer iteration for (|2.7p . In Q it was shown that the linearized Bregman 
iteration (|2 . 10|) is just one step of the inner iteration for each outer iteration. Here we repeat the arguments 
also in which begin by summing the second equation in (|2.10|) arriving at (using the fact that u° =p a = 0): 

p k + \u k + E-In AT ( Au1 - /) = P fe + - = 0, for fc = 1, 2, . . . . 

J 
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This gives us (I2.12p . and allows us to rewrite its first equation as: 

u fe+1 = argmin ( J{u) + ^-\\u-8v k+1 \\ 2 1 (2.14) 

uG-R" [ 25 J 

i.e. we are adding back the "linearized noise", where v k+1 is defined in (|2.11[) . 

In [2| and [3J some interesting analysis was done for (|2.10[) . (and some for (|2.14ll ). This was done first for 
J(u) continuously diffcrcntiablc in (|2.10l) and the gradient dJ(u) satisfying 

\\dJ(u)-dJ{v)\\ 2 <f3(dJ{u)-dJ{v),u-v), (2.15) 

Vw,w€i?™, /3>0. In 3] it was shown that, if (|2.15l) is true, then both of the sequences (u k )kt£N and (p h )keN 
denned by (|2.10|) converge for < S < p^mi • 

In [J] the authors recently give a theoretical analysis, showing that the iteration in (|2 . 1 1 1) and (12 . 12[) 
converges to the unique solution of 

min| A1 || U || 1 + l|| U || 2 :A U = /| (2.16) 

They also show the interesting result: let S be the set of all solutions of the Basis Pursuit problem (11.11) . 
(TO|) and let 

Mi = argmin ||u|| 2 (2-17) 
which is unique. Denote the solution of (12.161) as u* . Then 

lim [[tt*-in||=0. (2.18) 

In passing they show that 

IKII < IKH for all n>0 (2.19) 

which we will use below. 

Another theoretical analysis on Linearized Bregman algorithm is given by Yin in [14], where he shows 
that Linearized Bregman iteration is equivalent to gradient descent applied to the dual of the problem (|2.16[) 
and uses this fact to obtain an elegant convergence proof. 

This summarizes the relevant convergence analysis for our Bregman and linearized Bregman models. 

Next we recall some results from [7J regarding noise and Bregman iteration. 

For any sequence {u k },{p k } satisfying (|2.7|l for J continuous and convex, we have, for any jj, 

D5^fi,« fc )-I> J p fc -Hfi J « fc - 1 )<(Afi-/,A«*- 1 -/)-||A« fc - 1 -/[[ a . (2.20) 

Besides implying that the Bregman distance between u k and any element u satisfying Au — f is mono- 
tonically decreasing, it also implies that, if u is the "noise free" approximation to the solution of (jl.lj) . the 
Bregman distance between u k and u diminishes as long as 

\\Au k — /|| > 1 1 -Au— /|| =(T, where a is some measure of the noise (2.21) 

i.e., until we get too close to the noisy signal in the sense of (|2 . 2 1 [) . Note, in we took A to be the identity, 
but these more general results are also proven there. This gives us a stopping criterion for our denoising 
algorithm. 

In [7J we obtained a result for linearized Bregman iteration, following [15| , which states that the Bregman 
distance between u and u k diminish as long as 

\\Au~f\\<{l-25\\AA T \\) \\Au k -f\\ (2.22) 

so we need 0<25||AA T || < 1. 

In practice, we will use (|2.21j) as our stopping criterion. 



Stanley Osher, Yu Mao, Bin Dong, Wotao Yin 



5 



3 Convergence 

We begin with the following simple results for the linearized Bregman iteration or the equivalent algorithm 
(12~TU1) . 

Theorem 3.1. Ifu k ^u°°, then Au°° = f . 

Proof. Assume Au°° ^ / '. Then A T (Au°° - /) ^ since A T has full rank. This means that for some i, 
(A T (Au k — f))i converges to a nonzero value, which means that v k+1 — v k does as well. On the other hand 
{v k } — {u k /S +p k } is bounded since {u k } converges and p fc e[— Therefore {v k } is bounded, while 
v k+1 —vf converges to a nonzero limit, which is impossible. □ 

Theorem 3.2. Ifu k ^u°° and v k — ► v°° , then u 00 minimizes {J(u) + -^\\u\\ 2 : Au = f} . 
Proof. Let J{u) = J{u) + ^ ||u|| 2 . then 

dJ(u) = dJ(<u) + -u. 

o 

Since dJ(u k )=p k = v k — u k /5, we have dJ(u k ) = v k . Using the non-negativity of the Bregman distance we 
obtain 

J(u k ) < J(u opt ) - (u opt -u k ,dJ(u k )) 

= J (""opt) - ("opt ~U k ,V k ) 

where u opt minimizes (jl.ll) with J replaced by J, which is strictly convex. 
Let /c — » oo, we have 

J(u°°)< JKptM^opt^ 00 ^ 00 ) 

Since v k — A t ^2^q A T (f — Au^), we have v°° 6 range(A T ). Since Au op t — Au 00 — f, we have (u op t — 
u ao ,v ca )=Q, which implies J(u°°) < J{u opt ). □ 

Equation (|2.16|) (from a result in [3j ) implies that u°° will approach a solution to (jl.ljl . (|1.2j) . as /i 
approaches oo. 

The linearized Bregman iteration has the following monotonicity property: 
Theorem 3.3. If u k+1 ^ it fc and < 5 < 2/||AA- r ||, then 

||A,/ +1 -/||< \\Au k -f\\. 

Proof. Let 

u k+i _ u k = Aw fc _ w fc+i _ v fc _ Ay k 
Then the shrinkage operation is such that 

Av k = Sq k Av k (3.23) 

with 

f = l ifw 4 fc+1 wf>0 
g*<=0 iiu^ +1 = u k = 
Is (0,1] otherwise 

Let Q fc = Diag (q k ). Then (|3T23]) can be written as 

Au k = 6Q k Av k = 5Q k A T (f - Au k ) (3.24) 

which implies 

Au k+1 -f=(I-6AQ k A T )(Au k -f). (3.25) 

From (|3~23l) . Q fc is diagonal with X Q fc ^ 7, so ^ Ag fc A T ^ AA T . If we choose 5 > such that MA T -< 
27, then Q<5AQ k A T <2I or -7 -< I- SAQ k A T < I which implies that ||^4.xi fc — 1 1 is not increasing. To get 
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strict decay, we need only show that AQ k A T (Au k — /) =0 is impossible if u k+1 ^ u k . Suppose AQ k A T (Au k — 
/) = holds, then from (I3.24|) we have: 

(Au k ,Av k )=5(A T (f-Au k ),Q k A T (f~Au k )}=0. 

By (|3.23j) . this only happens if u k+1 — u k for all i, which is a contradiction. □ 

We are still faced with estimating how fast the residual decays. It turns out that if consecutive elements 
of u do not change sign, then ||Au— /|| decays exponentially. By 'exponential' we mean that the ratio of the 
residuals of two consecutive iteration converges to a constant, this type of convergence is sometimes called 
linear convergence. Here we define 

S u = {x E R n : signfc) = sign(u;)>Vi} (3.26) 
(where sign(0) = and sign(a) = a/\a\ for a=/=0). Then we have the following: 

Theorem 3.4. If u k E S = S Uk for k E (Ti,T2), thenu k converges ton*, where u* E argmin{||Au — f\\ 2 : u E S} 
and \\Au k — f\\ 2 decays to \\Au* — f\\ 2 exponentially. 

Proof. . Since u k E S for k E [T l} T 2 ], we can define Q = Q k for T x < k < T 2 - 1. From $35§ we see that Q k is 
a diagonal matrix consisting of zeros or ones, so Q = Q T Q. Moreover, it is easy to see that S = {x\Qx = x}. 
Following the argument in Theorem 13.31 we have: 



u k+i _ u k^ Au k = S Q Av k = SQA T (f - Au k ) (3.27) 
Au k+1 -f=[I-5AQA T ]{Au k -f) (3.28) 

and 

-I<I-bAQA T <I. 

Let R n = Vo ® Vi , where Vb is the null space of AQA T and V\ is spanned by the eigenvectors corresponding 
to the nonzero eigenvalues of AQA T . Let Au k — f ~w k,a + w k ' 1 , where w k ' : ' E Vj for j — 0,1. From (|3.28[) we 
have 

w k +h o = w k,o 



w 



= [I-SAQA^w"- 1 



for T l <k<T 2 -l. Since w k ' 1 is not in the null space of AQA T , then (EH?) and ([3T2"8"f imply that 
decays exponentially. Let w° = w k '°, then AQA T w° = AQQA T w°^QA T w°=0. Therefore, from (|3T27|) 
we have 

Au k = SQ T A T (f - Au k ) = SQA T (w° + w M ) = SQA T w k ' 1 . 

Thus 1 1 Ait || decays exponentially. This means {u k } forms a Cauchy sequence in S, so it has a limit u* E S. 
Moreover 

Au*-f = \im(Au k - /) = limw fe <° +limw k ' 1 = w°. 

k k k 

Since Vq and V\ are orthogonal: 

||Au fc -/|| 2 = ||^°|| 2 + |h M || 2 = ||A U *-/|| 2 + ||^ 1 || 2 , 

so \\Au k — /|| 2 — || Au* — /|| 2 decays exponentially. The only thing left to show is that 

u* = argmin(||ylii — /|| 2 : u E S) = argmin{|| Au — f\\ 2 : Qu = u}. 

This is equivalent to way that A T {Au* — f) is orthogonal with the hyperspace {u:Qu — u}. It's easy to see 
that since Q is a projection operator, a vector v is orthogonal with {u : Qu = u} if and only if Qv = 0, thus we 
need to show QA T (Au* — f ) = 0. This is obvious because we have shown that Au* — f = w° and QA T w° = 0. 
So we find that u* is the desired minimizer. □ 
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Therefore, instead of decaying exponentially with a global rate, the residual of the linearized Bregman 
iteration decays in a rather sophisticated manner. From the definition of the shrinkage function we can see 
that the sign of an element of u will change if and only if the corresponding element of v crosses the boundary 
of the interval [— If /i is relatively large compared with the size of At> (which is usually the case when 
applying the algorithm to a compressed sensing problem) , then at most iterations the signs of the elements 
of u will stay unchanged, i.e. u will stay in the subspace S u defined in (|3.26p for a long while. This theorem 
tells us that under this scenario u will quickly converge to the point u* that minimizes — /|| inside S u , 
and the difference between — f\\ and ||Ait* — /|| decays exponentially. After u converges to u*, u will 
stay there until the sign of some element of u changes. Usually this means that a new nonzero element of u 
comes up. After that, u will enter a different subspace S and a new converging procedure begins. 

The phenomenon described above can be observed clearly in Fig [TJ The final solution of u contains 
five non-zero spikes. Each time a new spike appears, it converges rapidly to the position that minimizes 
||ykt — /|| in the subspace S u . After that there is a long stagnation, which means u is just waiting there until 
the accumulating v brings out a new non-zero element of u. The larger /i is, the longer the stagnation takes. 
Although the convergence of the residual during each phase is fast, the total speed of the convergence suffers 
much from the stagnation. The solution of this problem will be described in the next section. 




10 20 30 40 50 60 70 80 

iteration 



Figure 1: The left figure presents a simple signal with 5 non-zero spikes. The right figure shows how the 
linearized Bregman iteration converges. 



4 Fast Implementation 

The iterative formula in Algorithm [1] below gives us the basic linearized Bregman algorithm designed to 
solve (fi~il.(fi~2i 



Algorithm 1 Bregman Iterative Regularization 
Initialize: u = 0, v = 0. 
while "II/ — Au\\ not converge" do 

v k+1 =v k +A T (f-Au k ) 

u k+1 =(5 •shrink(w fe+1 ,/i) 
end while 



This is an extremely concise algorithm, simple to program, involve only matrix multiplication and shrink- 
age. When A consists of rows of a matrix of a fast transform like FFT which is a common case for compressed 
sensing, it is even faster because matrix multiplication can be implemented efficiently using the existing fast 
code of the transform. Also, storage becomes a less serious issue. 
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We now consider how we can accelerate the algorithm under the problem of stagnation described in 
the previous section. From that discussion, during a stagnation u converges to a limit u* so we will have 
u k+1 « u k+2 « • • • w w fc + m pa u* for some m. Therefore the increment of v in each step, A T (f — Au), is fixed. 
This implies that during the stagnation u and v can be calculated explicitly as following 



,k+j : 



,k+l 



, } k+j = v k +j.A T (f- Au k+1 ) 



i=i,' 



, m 



(4.29) 



If we denote the set of indices of the zero elements of u* as Iq and let I\ = Iq be the support of u*, then v\ 
will keep changing only for i e Iq and the iteration can be formulated entry- wise as: 



u k+j= k+1 



Vz 



v* +j = v k +j ■ (A T (/ - i e I 

ie.Ii 



k+j _ fe+i 



(4.30) 



for j' = l,---,m. The stagnation will end when u begins to change again. This happens if and only if some 
element of v in Iq (which keeps changing during the stagnation) crosses the boundary of the interval [— fx,fj]. 
When ielo, vf £ [—fj,,fj], so we can estimate the number of the steps needed for v h to cross the boundary 
Vi e Iq from (|4.30p , which is 



^•sign((A T (/~A^+ 1 )) i )- 
(A-r(f-Au^))i 



„fe+i 



ViG In 



(4.31) 



and 



s = minjsi} 

ie/ 



(4.32) 



is the number of steps needed. Therefore, s is nothing but the length of the stagnation. Using (|4.29[) . we 
can predict the end status of the stagnation by 



u k+s = u k+1 

„.h-\-s — ^ ,k 



= v k + s-A T (f-Au k+1 ) 



(4.33) 



Therefore, we can kick u to the critical point of the stagnation when we detect that u has been staying 
unchanged for a while. Specifically, we have the following algorithm: Algorithm [2] 



Algorithm 2 Linearized Bregman Iteration with Kicking 
Initialize: u = 0, v = 0. 
while "II/ — Au\\ not converge" do 
if "u^ku^ then 

calculate s from (|4~3T|) and (|4~32|) 
v k+1 =v<? + s- (A T (/ - Au k )) z ,VieIo 
v k+1 =v k , VieJi 
else 

u fc + 1 = t ; fc + yl T (/-ylu fe ) 

end if 

u k+1 = S ■ shrmk(v k+1 , fj.) 
end while 



Indeed, this kicking procedure is similar to line search commonly used in optimization problems and 
modifies the initial algorithm in no way but just accelerates the speed. More precisely, note that the 
output sequence {u k ,v k } is a subsequence of the original one, so all the previous theoretical conclusions on 
convergence still hold here. 

An example of the algorithm is shown in Fig[2] It is clear that all the stagnation in the original convergence 
collapses to single steps. The total amount of computation is reduced dramatically. 
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Figure 2: The left figure presents the convergence curve of the original linearized Bregman iteration using 
the same signal as FigQ] The right figure shows the convergence curve of the linearized Bregman iteration 
with the kicking modification. 



5 Numerical Results 

In this section, we demonstrate the effectiveness of the algorithm (with kicking) in solving basis pursuit and 
some related problems. 

5.1 Efficiency 

Consider the constrained minimization problem 

min|u|i s.t. Au = f, 

where the constraints Au — f are under-determined linear equations with ianmxn matrix, and / generated 
from a sparse signal u that has a number of nonzeros n<m. 

Our numerical experiments use two types of A matrices: Gaussian matrices whose elements were gen- 
erated from i.i.d. normal distributions 7V(0,1) (randn(m,n) in MATLAB), and partial discrete cosine 
transform (DCT) matrices whose k rows were chosen randomly from the n x n DCT matrix. These matrices 
are known to be efficient for compressed sensing. The number of rows m is chosen as m^n\og(n/K) for 
Gaussian matrices and m~«logn for DCT matrices (following [H| )■ 

The tested original sparse signals u had numbers of nonzeros equal to 0.05n and 0.02n rounded to the 
nearest integers in two sets of experiments, which were obtained by round(0.05*n) and round(0.02*n) in 
MATLAB, respectively. Given a sparsity ||w||o, i-e., the number of nonzeros, an original sparse signal uGR™ 
was generated by randomly selecting the locations of ||u||o nonzeros, and sampling each of these nonzero 
elements from U(— 1,1) (2*(rand-0.5) in MATLAB). Then, / was computed as Au. When \\u\\o is small 
enough, we expect the basis pursuit problem, which we solved using our fast algorithm, to yield a solution 
u* = u from the inputs A and /. 

Note that partial DCT matrices are implicitly stored fast transforms for which matrix-vector multipli- 
cations in the forms of Ax and A 1 x were computed by the MATLAB commands dct(x) and idct(x), 
respectively. Therefore, we were able to test on partial DCT matrices of much larger sizes than Gaussian 
matrices. The sizes m-by-n of these matrices are given in the first two columns of Table [TJ 

Our code was written in MATLAB and was run on a Windows PC with a Intel(R) Core(TM) 2 Duo 
2.0GHz CPU and 2GB memory. The MATLAB version is 7.4. 

The set of computational results given in Table [T] was obtained by using the stopping criterion 



(5.34) 
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which was sufficient to give a small error \\u k — u||/||u||. Throughout our experiments in Table [TJ we used 
/i = 1 to ensure the correctness of the results. 



Table 1: Experiment results using 10 random instances for each configuration of (m,n, ||u||o), with nonzero 
elements of u come from U{— 1,1). 



Results of linearized Bregman-Li with kicking 

T7T 



Stopping tolerance. 



p u fe -/||/||/||<io- 





Gaussian matrices 






stopping itr. k 


relative error \\u k 




time (sec.) 






mean 


std. 


max 


mean 


std. 


max 


mean 


std. 


max 


n 


m 








IM|o = 0.05 


n 








1000 


300 


422 


67 


546 


2.0e-05 


4.3e-06 


2.7e-05 


0.42 


0.06 


0.51 


2000 


600 


525 


57 


612 


1.8e-05 


1.9e-06 


2.1e-05 


4.02 


0.45 


4.72 


4000 


1200 


847 


91 


1058 


1.7e-05 


1.7e-06 


1.9e-05 


25.7 


2.87 


32.1 


n 


m 








No=0.02 


n 








1000 


156 


452 


98 


607 


2.3e-05 


2.6e-06 


2.6e-05 


0.24 


0.06 


0.33 


2000 


312 


377 


91 


602 


2.0e-05 


4.0e-06 


2.9e-05 


1.45 


0.38 


2.37 


4000 


468 


426 


30 


477 


1.6e-05 


2.1e-06 


2.0e-05 


6.96 


0.51 


7.94 




Partial DCT matrices 


n 


m 








IM|o = 0.05 


n 








4000 


2000 


71 


6.6 


82 


9.1e-06 


2.5e-06 


1.2e-05 


0.43 


0.06 


0.56 


20000 


10000 


158 


14.5 


186 


6.2e-06 


2.1c-06 


l.lc-05 


3.95 


0.36 


4.73 


50000 


25000 


276 


14 


296 


6.8e-06 


2.6e-06 


1.0e-05 


17.6 


0.99 


19.2 


n 


m 


||u|| = 0.02n 


4000 


1327 


52 


7.0 


64 


8.6e-06 


1.3e-06 


l.le-05 


0.27 


0.04 


0.35 


20000 


7923 


91 


10.3 


115 


7.2e-06 


2.2e-06 


l.le-05 


2.36 


0.30 


3.02 


50000 


21640 


140 


9.7 


153 


5.9e-06 


2.4c-06 


l.le-05 


8.53 


0.66 


9.42 



5.2 Robustness to Noise 

In real applications, the measurement / we obtain is usually contaminated by noise. The measurement we 
have is: 

/ = f + n = Au + n, neAf(0,a). 

To characterize the noise level, we shall use SNR (signal to noise ratio) instead of a itself. The SNR is 
defined as follows 

Hull 

SNR(u)~20\og w CjJ-). 

In this section we test our algorithm on recovering the true signal u from A and the noisy measurement 
/. As in the last section, the nonzero entries of u are generated from U{— 1,1), and A is either a Gaussian 
random matrix or a partial DCT matrix. Our stopping criteria is given by 

std(Au k -f)<a, and Iter. < 1000, 

i.e. we stop whenever the standard deviation of residual Au k — f is less than a or the number of iterations 
exceeds 1000. Table [2] shows numerical results for different noise level, size of A and sparsity. We also show 
one typical result for a partial DCT matrix with size n = 4000 and ||u||o = 0.02n = 80 in Figure |3] 
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Table 2: Experiment results using 10 random instances for each configuration of (m,n, ||u||o). 



Results of linearized Bregman-Li with kicking 



Stopping criteria. 


std(Au k -f)<a. 




Gaussian matrices 






stopping itr. k 


relative 


error ||u fe — u||/||it|| 


time (sec.) 






mean 


std. 


max 


mean 


std. 


max 


mean 


std. 


max 


Avg. SNR 


(n,m) 




u||q = 0.05n 


26.12 


(1000,300) 


420 


95 


604 


0.0608 


0.0138 


0.0912 


0.33 


0.09 


0.53 


25.44 


(2000,600) 


206 


32 


253 


0.0636 


0.0128 


0.0896 


1.49 


0.22 


1.79 


26.02 


(4000,1200) 


114 


11 


132 


0.0622 


0.0079 


0.0738 


3.32 


0.31 


3.81 


Avg. SNR 


(n,m) 


1 


u|| = 0.02n 


27.48 


(1000,156) 


890 


369 


1612 


0.0456 


0.0085 


0.0599 


0.42 


0.17 


0.73 


25.06 


(2000,312) 


404 


64 


510 


0.0638 


0.0133 


0.0843 


1.37 


0.23 


1.74 


26.04 


(4000,468) 


216 


35 


267 


0.0557 


0.0068 


0.0639 


3.29 


0.55 


4.13 




Partial DCT matrices 


Avg. SNR 


(n,m) 


1 


u||o = 0.05n 


23.97 


(4000, 2000) 


151 


9.2 


170 


0.0300 


0.0028 


0.0332 


0.94 


0.07 


1.03 


24.00 


(20000,10000) 


250 


14 


270 


0.0300 


0.0010 


0.0318 


7.88 


0.62 


8.86 


24.09 


(50000,25000) 


274 


9.9 


295 


0.0304 


0.0082 


0.0315 


20.4 


0.74 


20.1 


Avg. SNR 


(n,m) 


1 


u|| = 0.02n 


24.29 


(4000,1327) 


130 


11 


157 


0.0223 


0.0023 


0.0253 


0.79 


0.08 


1.00 


24.37 


(20000,7923) 


223 


14 


257 


0.0204 


0.0025 


0.0242 


6.89 


0.53 


8.15 


24.16 


(50000,21640) 


283 


19 


311 


0.0193 


0.0012 


0.0207 


21.5 


1.68 


24.1 



5.3 Recovery of Signal with High Dynamical Range 

In this section, we test our algorithm on signals with high dynamical ranges. Precisely speaking, let 
MAX = max{|?2j| :i — l,...,n} and MIN = min{|itj| : Ui ^0,i = l,...,n}. The signals we shall consider here sat- 
isfy Trjm^r ~ 10 10 . Our u is generated by multiplying a random number in [0,1] with another one randomly 
picked from {1,10,...,10 10 }. Here we adopt the stopping criteria 

H^-/ll <10 -n 



ll/l 

for the case without noise (Figure 0]) and the same stopping criteria as in the previous section for the noisy 
cases (Figures [5][7l) . In the experiments, we take the dimension n = 4000, the number of nonzeros of u to 
be 0.02n, and /j,= 10 10 . Here \i is chosen to be much larger than before, because the dynamical range of u 
is large. Figure 2] shows results for the noise free case, where the algorithm converges to a 10 -11 residual 
in less than 300 iterations. Figures [5][7] show the cases with noise (the noise is added the same way as in 
previous section). As one can see, if the measurements are contaminated with less noise, signals with smaller 
magnitudes will be recovered well. For example in Figure [5j the SNRwll8, and the entries of magnitudes 
10 4 are well recovered; in Figure El the SNR«97, and the entries of magnitudes 10 5 are well recovered; and 
in Figure the SNRw49, and the entries of magnitudes 10 7 are well recovered. 

5.4 Recovery of Sinusoidal Waves in Huge Noise 

In this section we consider 

u(t) = asm(at) + bcos((3t), 

where a, b, a and j3 are unknown. The observed signal u is noisy and has the form u = u + n with n~A/"(0,<r). 
In practice, the noise in u could be huge, i.e. possibly have a negative SNR, and we may only be able 
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Noisy Measurements, SNR = 23.1084 Iterations = 102, Final Error = 0.020764 




500 1000 1000 2000 3000 4000 



Figure 3: The left figure presents the clean (red dots) and noisy (blue circles) measurements, with 
SNR=23.1084; the right figure shows the reconstructed signal (blue circles) v.s. original signal (red dots), 
where the relative error=0. 020764, and number of iterations is 102. 

to observe partial information of u, i.e. only a subset of values of u is known. Notice that the signal is 
sparse (only four spikes) in frequency domain. Therefore, this is essentially a compressed sensing problem 
and ^i-minimization should work well here. Now the problem can be stated as reconstructing the original 
signal u from random samples of the observed signal u using our fast £i-minimization algorithm. In our 
experiments, the magnitudes a and b are generated from U{— 1, 1); frequencies a and /3 are random multiples 
of — , i.e. a — ki^- and a = ^2^7, with ki taken from {0,1,. . . ,n — 1} randomly and n denotes the dimension. 
We let I be a random subset of {1,2, ...,n} and f = u(I), and take A and A T to be the partial matrix of 
inverse Fourier matrix and Fourier matrix respectively. Now we perform our algorithm adopting the same 
stopping criteria as in section 15721 and obtain a reconstructed signal denoted as x. Notice that reconstructed 
signal x is in Fourier the domain, not in the physical domain. Thus we take an inverse Fourier transform 
to get the reconstructed signal in physical domain, denoted as u*. Since we know a priori that our solution 
should have four spikes in Fourier domain, before we take the inverse Fourier transform, we pick the four 
spikes with largest magnitudes and set the rest of the entries to be zero. Some numerical results are given in 
Figure [S ill! Our experiments show that the larger the noise level is, the more random samples we need for 
a reliable reconstruction, where reliable means that with high probability (>80%) of getting the frequency 
back exactly. As for the magnitudes a and b, our algorithm cannot guarantee to recover them exactly (as 
one can see in Figure [8l llip . However, frequency information is much more important than magnitudes in 
the sense that the reconstructed signal is less sensitive to errors in magnitudes than errors in frequencies 
(see bottom figures in Figure ISITTTI) . On the other hand, once we recover the right frequencies, one can use 
hardware to estimate magnitudes accurately. 

6 Conclusion 

We have proposed the linearized Bregman iterative algorithms as a competitive method for solving the 
compressed sensing problem. Besides the simplicity of the algorithm, the special structure of the iteration 
enables the kicking scheme to accelerate the algorithm even when [i is extremely large. As a result, a sparse 
solution can always be approached efficiently. 

It also turns out that our process has remarkable denoising properties for undersampled sparse signals. 
We will pursue this in further work. 

Our results suggest there is a big category of problem that can be solved by linearized Bregman iterative 
algorithms. We hope that our method and its extensions could produce even more applications for problems 
under different scenarios, including very underdetermined inverse problems in partial differential equations. 
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Figure 4: Upper left, true signal (red dots) v.s. recovered signal (blue circle); upper right, one zoom-in to 
the lower magnitudes; lower left, decay of residual log 10 "^"jjyjp^ j lower right, decay of error to true solution 
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Figure 5: Noisy case. Left figure, true signal (red dots) v.s. recovered signal (blue circle); right figure, one 
zoom-in to the magnitudes 10 5 . The error is measured by ^^n^jpH . 
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Figure 6: Noisy case. Left figure, true signal (red dots) v.s. recovered signal (blue circle); right figure, one 
zoom-in to the magnitudes 10 6 . The error is measured by ^prr 1 ^- 
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SNR = 49.289, Iter = 90, Error = 0.00067413 
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Figure 7: Noisy case. Left figure, true signal (red dots) v.s. recovered signal (blue circle); right figure, one 
zoom-in to the magnitude^ 10 8 . The error is measured by ^",^,"^1 . 
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Orignal and Noisy Waves, SNR = 2.6185 
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Figure 8: Reconstruction using 20% random samples of u with SNR= 2.6185. The upper left figure shows 
the original (red) and noisy (blue) signals; the upper right shows the reconstruction (blue circle) v.s. original 
signal (red dots) in Fourier domain in terms of their magnitudes (i.e. \u*\ v.s. u|); bottom left shows the 
reconstructed (blue) v.s. original (red) signal in physical domain; and bottom right shows one close-up of 
the figure at bottom left. 
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Orignal and Noisy Waves, SNR = -4.7836 
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Figure 9: Reconstruction using 40% random samples of u with SNR= —4.7836. The upper left figure shows 
the original (red) and noisy (blue) signals; the upper right shows the reconstruction (blue circle) v.s. original 
signal (red dots) in Fourier domain in terms of their magnitudes (i.e. \u*\ v.s. u|); bottom left shows the 
reconstructed (blue) v.s. original (red) signal in physical domain; and bottom right shows one close-up of 
the figure at bottom left. 
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Orignal and Noisy Waves, SNR = -6.7908 




150 



100 



Frequency Domain 




200 400 



500 



Physical Domain 



One Zoom-In 




Figure 10: Reconstruction using 60% random samples of u with SNR= —6.7908. The upper left figure shows 
the original (red) and noisy (blue) signals; the upper right shows the reconstruction (blue circle) v.s. original 
signal (red dots) in Fourier domain in terms of their magnitudes (i.e. v.s. \u\); bottom left shows the 
reconstructed (blue) v.s. original (red) signal in physical domain; and bottom right shows one close-up of 
the figure at bottom left. 
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Orignal and Noisy Waves, SNR = -1 1 .001 6 
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Figure 11: Reconstruction using 80% random samples of u with SNR= —11.0016. The upper left figure 
shows the original (red) and noisy (blue) signals; the upper right shows the reconstruction (blue circle) v.s. 
original signal (red dots) in Fourier domain in terms of their magnitudes (i.e. \u*\ v.s. \u\); bottom left 
shows the reconstructed (blue) v.s. original (red) signal in physical domain; and bottom right shows one 
close-up of the figure at bottom left. 



